Integrating pitch and localisation cues at a speech fragment level

Authors

  • Heidi Christensen
  • Ning Ma
  • Stuart N. Wrigley
  • Jon Barker
Abstract

This paper proposes a novel speech-fragment-based approach for processing binaural data to improve the estimation of speech source locations in reverberant, multi-speaker recordings. The technique employs two stages. First, a robust multipitch tracking algorithm is used to locate local spectro-temporal ‘speech fragments’ – regions where the energy in the mixture is dominated by a single speech source. Second, robust localisation estimates are formed by integrating interaural time difference cues over each speech fragment. The technique is applied to the analysis of more than five hours of two-party meetings that have been constructed from a mixture of binaural mannequin recordings. It is shown that estimating location at the speech fragment level produces better results than conventional location-estimate smoothing techniques, leading to an increase in relative frame accuracy rate of more than 35%.
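
The second stage can be illustrated with a small sketch. The abstract does not say how the interaural time difference (ITD) cues are integrated, so the code below assumes one simple option: summing the interaural cross-correlation functions of all time-frequency cells in a fragment and taking the lag of the pooled peak. The function name `fragment_itd`, the array layout, and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def fragment_itd(xcorr, mask, lags):
    """Estimate one ITD for a fragment by pooling cross-correlations.

    xcorr: (frames, channels, n_lags) interaural cross-correlation per cell
    mask:  (frames, channels) boolean, True for cells inside the fragment
    lags:  (n_lags,) candidate ITDs in seconds
    Assumption: pooling is a plain sum; the paper may use another scheme.
    """
    if not mask.any():
        raise ValueError("fragment mask is empty")
    pooled = xcorr[mask].sum(axis=0)  # sum over fragment cells -> (n_lags,)
    return float(lags[np.argmax(pooled)])


# Toy data: 5 frames, 4 channels, 11 candidate lags from -0.5 ms to +0.5 ms.
lags = np.linspace(-5e-4, 5e-4, 11)
xcorr = np.zeros((5, 4, 11))
mask = np.zeros((5, 4), dtype=bool)
mask[1:4, 0:2] = True   # a fragment of 3 frames by 2 channels (6 cells)
xcorr[mask, 7] = 1.0    # every fragment cell has a peak at lag index 7
xcorr[2, 1, 2] = 2.0    # one cell also has a stronger spurious peak

# Estimate from the corrupted cell alone versus the whole fragment.
cell_estimate = float(lags[np.argmax(xcorr[2, 1])])
fragment_estimate = fragment_itd(xcorr, mask, lags)
```

In this toy case the corrupted cell on its own picks the spurious lag, while the pooled estimate follows the lag shared by the rest of the fragment. This is the intuition behind estimating location per fragment and not per frame.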

Similar resources

Binaural Cues for Fragment-Based Speech Recognition in Reverberant Multisource Environments

This paper addresses the problem of speech recognition using distant binaural microphones in reverberant multisource noise conditions. Our scheme employs a two-stage fragment decoding approach: first, spectro-temporal acoustic source fragments are identified using signal level cues, and second, a hypothesis-driven stage simultaneously searches for the most probable speech/background fragment labe...

A hearing-inspired approach for distant-microphone speech recognition in the presence of multiple sources

This paper addresses the problem of speech recognition in reverberant multisource noise conditions using distant binaural microphones. Our scheme employs a two-stage fragment decoding approach inspired by Bregman’s account of auditory scene analysis, in which innate primitive grouping ‘rules’ are balanced by the role of learnt schema-driven processes. First, the acoustic mixture is split into l...

Recent advances in fragment-based speech recognition in reverberant multisource environments

This paper addresses the problem of speech recognition using distant binaural microphones in reverberant multisource noise conditions. Our scheme employs a two-stage fragment decoding approach: first, spectro-temporal acoustic source fragments are identified using signal level cues, and second, a hypothesis-driven stage simultaneously searches for the most probable speech/background fragment labe...

The Function of Pitch Range Variations in Samples of Emotional Expressions in Persian

This study investigates the interface between emotion and intonation patterns (more specifically, duration and pitch amplitude of speech). To this end, the acoustic properties of spectral parameters related to speech prosody are investigated. The results of acoustic and statistical analysis show that mean level and range of F0 in the contours vary strongly as a function of the degree o...

Acoustic analysis of lexical tone in Mandarin infant-directed speech.

Using Mandarin Chinese, a "tone language" in which the pitch contours of syllables differentiate words, the authors examined the acoustic modifications of infant-directed speech (IDS) at the syllable level to test 2 hypotheses: (a) the overall increase in pitch and intonation contour that occurs in IDS at the phrase level would not distort lexical pitch at the syllable level and (b) IDS provide...

Publication date: 2007